Abstract

The problem of uncertainty in graph structure identification in a random variable network is considered. An approach to the construction of upper and lower confidence bounds for graph structures is developed. This approach is applied to construct upper and lower confidence bounds for the threshold similarity graph. The stability of the confidence bounds and the gap between the upper and lower confidence bounds are investigated. Theoretical results are illustrated by numerical experiments.

1 Introduction


A random variable network is a general model used in biological and medical studies [2], gene expression and gene co-expression analysis [8], [16], market network analysis [4], [18], climate network analysis [26], and other fields. Different graph structures are used to emphasize important information in the network. A simple and popular graph structure in a random variable network is the threshold similarity graph. The threshold similarity graph emphasizes the strength and topology of connections in the network; in market network analysis it is known as the market graph [4].

One of the most important problems related to graph structures is the uncertainty of their identification from observations [18]. The main goal of the paper is to develop an approach to constructing, from observations, upper and lower confidence bounds for the threshold similarity graph. These bounds allow one to draw statistically significant conclusions on the inclusion of edges in the threshold similarity graph. The stability of these confidence bounds and the gap (zone of uncertainty) between the upper and lower confidence bounds are investigated as well.

The problem of identifying, from observations, a subset of random variables with specific properties within a given set of random variables was considered in [13]. To solve this problem, [13] proposes constructing an upper confidence bound (in our terminology) for the desired subset. We extend this approach to the problem of graph structure identification, and in particular to the threshold similarity graph identification problem. In contrast to [13], we construct not only an upper confidence bound but simultaneous upper and lower confidence bounds for graph structures. We show that upper and lower confidence bounds can be constructed using multiple testing statistical procedures with FWER (Family-Wise Error Rate) control in the strong sense [15], [5]. As regards the construction of the upper confidence bound, our approach is close to [11]. The problem of constructing two bounds for Gaussian concentration graph identification was considered in [7] and developed in [9]. The method in [7] was based on the simultaneous construction of confidence intervals for the elements of precision (correlation) matrices of the multivariate normal distribution. The set of such simultaneous confidence intervals leads to a lower confidence bound (in our terminology) for all nonzero elements of the precision (correlation) matrices. However, the construction of upper confidence bounds in [7] was based on subjective arguments only.

We propose a more general approach based on hypothesis testing theory and prove that, with a given probability $P^*$, the threshold similarity graph in any random variable network is simultaneously included in our upper confidence bound and includes our lower confidence bound. Furthermore, we investigate the gap between the upper and lower confidence bounds, which can be used to characterize the uncertainty of the identification procedure. To understand the limits of our approach we investigate in detail the stability and the size of the gap in three networks: the Pearson correlation network, the Kendall correlation network and the sign similarity network [18].

The article is organized as follows. Section 2 gives basic definitions and notation. Section 3 describes our approach to constructing upper confidence bounds based on multiple hypotheses testing procedures with FWER control. Section 4 describes our approach to constructing lower confidence bounds. Section 5 proposes a procedure for the simultaneous construction of upper and lower confidence bounds. Section 6 introduces and discusses the gap between the upper and lower confidence bounds. Section 7 investigates three widely used random variable networks in detail. Section 8 investigates the stability and the size of the gap by simulation. Section 9 gives the conclusions.


2 Problem statement and notations 


Let $X = (X_1, \ldots, X_N)$ be a random vector with distribution $f(x; \theta)$, $\theta \in \Omega$, where $\Omega$ is a parameter space and $x \in R^N$. Let $\gamma_{i,j}(\theta) = \gamma(X_i, X_j)$ be a measure of similarity between $X_i$ and $X_j$, $i, j = 1, \ldots, N$. For simplicity of notation we will sometimes omit the index $\theta$ when this does not lead to confusion. Pearson correlation, sign similarity and Kendall correlation will be considered as measures of similarity. According to [18], the pair $(X, \gamma)$ will be called a random variable network.

The matrix

$$\Gamma = \begin{pmatrix} 1 & \gamma_{1,2} & \cdots & \gamma_{1,N} \\ \vdots & \vdots & \ddots & \vdots \\ \gamma_{N,1} & \gamma_{N,2} & \cdots & 1 \end{pmatrix} \tag{1}$$

describes all pairwise similarities between the components of the vector $X$. The random variable network generates a network model. The network model for the random variable network $(X, \gamma)$ is a complete weighted graph $(V, \Gamma)$, where $V = \{1, 2, \ldots, N\}$ is the set of nodes and $\Gamma = (\gamma_{i,j})$ is the matrix of weights, $\gamma_{i,j} = \gamma(X_i, X_j)$, $i \neq j$, $i, j = 1, \ldots, N$. We will call this complete weighted graph the true network model. The threshold similarity graph in the network model $(V, \Gamma)$ is the unweighted graph $(V, E)$, where $(i, j) \in E$ if and only if $\gamma_{i,j} > \gamma_0$ [18]. This threshold similarity graph will be called the true threshold similarity graph. If $\gamma_{i,j}$ is measured by correlation, the threshold similarity graph may be called a correlation graph [25].

Let us introduce the following notation.

$J = \{(i, j) : i < j;\ i, j = 1, \ldots, N\}$ is the set of index pairs,

$$M = |J| = \frac{N(N-1)}{2},$$

where $|A|$ denotes the number of elements of the set $A$.

$J_e(\theta, \gamma_0) = \{(i, j) : \gamma_{i,j} > \gamma_0,\ i < j\}$ is the set of edges of the true threshold similarity graph,

$J_n(\theta, \gamma_0) = \{(i, j) : \gamma_{i,j} \le \gamma_0,\ i < j\}$ is the set of pairs of vertices of the true threshold similarity graph that are not joined by an edge.
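The sets introduced above can be illustrated with a minimal Python sketch. The function name `true_threshold_graph`, the zero-based node indices and the toy matrix are hypothetical, and pairs whose similarity equals the threshold exactly are treated here as non-edges, which is an assumption of the sketch.

```python
from itertools import combinations

def true_threshold_graph(gamma, gamma0):
    """Split all index pairs i < j into edges (J_e) and non-edges (J_n).

    gamma  : symmetric similarity matrix as a list of lists (hypothetical input)
    gamma0 : threshold
    Nodes are numbered from 0 here, not from 1 as in the text.
    """
    n_nodes = len(gamma)
    pairs = list(combinations(range(n_nodes), 2))  # the set J, |J| = N(N-1)/2
    edges = {(i, j) for i, j in pairs if gamma[i][j] > gamma0}
    # Assumption: a similarity exactly equal to gamma0 counts as "no edge".
    non_edges = {(i, j) for i, j in pairs if gamma[i][j] <= gamma0}
    return pairs, edges, non_edges

# Toy 3-node similarity matrix (illustrative values only).
gamma = [
    [1.0, 0.5, 0.1],
    [0.5, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]
pairs, edges, non_edges = true_threshold_graph(gamma, gamma0=0.3)
print(len(pairs), sorted(edges), sorted(non_edges))
```

By construction every pair falls into exactly one of the two sets, so their sizes add up to $M$.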

Let $x = (x_i(t))$, $i = 1, \ldots, N$; $t = 1, \ldots, n$, be a sample of size $n$ from the distribution of the vector $X$. The problem of threshold similarity graph identification is to identify the true threshold similarity graph from observations. The uncertainty of threshold similarity graph identification is discussed in [18], where it is shown that the uncertainty can be high. To handle this uncertainty we propose to construct two sets of edges $L(x)$ and $U(x)$ satisfying the condition

$$P_\theta\big(L(x) \subseteq J_e(\theta, \gamma_0) \subseteq U(x)\big) \ge P^*, \quad \forall \theta \in \Omega. \tag{2}$$

These sets $U(x)$, $L(x)$ will be called simultaneous upper and lower confidence bounds of level $P^*$ (confidence probability) for the true threshold similarity graph $J_e(\theta, \gamma_0)$.

First we consider the problems of constructing upper and lower confidence bounds for $J_e(\theta, \gamma_0)$ separately.


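3 Upper confidence bounds


A set $U(x) \subseteq J$ satisfying

$$P_\theta\big(U(x) \supseteq J_e(\theta, \gamma_0)\big) \ge P^*, \quad \forall \theta \in \Omega \tag{3}$$

will be called an upper confidence bound of level $P^*$ for $J_e(\theta, \gamma_0)$.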

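Now we show that the problem of upper confidence bound construction can be considered from the multiple hypotheses testing point of view. Consider the set of individual hypotheses

$$h^e_{i,j}: \gamma_{i,j} > \gamma_0 \quad \text{versus} \quad k^e_{i,j}: \gamma_{i,j} \le \gamma_0, \quad i < j. \tag{4}$$

Standard tests for hypotheses (4) have the form

$$\varphi^e_{i,j}(x) = \begin{cases} 1, & T_{i,j}(x) < c^e_{i,j}, \\ 0, & T_{i,j}(x) \ge c^e_{i,j}, \end{cases} \tag{5}$$

where $c^e_{i,j}$ is defined from

$$P_{\gamma_0}\big(T_{i,j}(x) < c^e_{i,j}\big) = \alpha_{i,j}.$$

Lemma 3.1. For the set $U(x) = \{(i, j) : \varphi^e_{i,j}(x) = 0,\ i < j\}$, condition (3) is equivalent to the condition

$$P_\theta\Big(\prod_{(i,j) \in J_e(\theta, \gamma_0)} \big(1 - \varphi^e_{i,j}(x)\big) = 1\Big) \ge P^*, \quad \forall \theta \in \Omega. \tag{6}$$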

The proof of Lemma 3.1 is given in the Appendix.

Let the set of tests $\varphi^e_{i,j}(x)$, $i, j = 1, \ldots, N$, $i < j$, be a multiple hypotheses testing procedure for hypotheses (4) with FWER control in the strong sense at level $\alpha = 1 - P^*$, i.e.

$$P_\theta\Big(\prod_{(i,j) \in J_e(\theta, \gamma_0)} \big(1 - \varphi^e_{i,j}(x)\big) = 0\Big) \le 1 - P^*, \quad \forall \theta \in \Omega.$$

It follows from Lemma 3.1 that if the set of tests $\varphi^e_{i,j}(x)$ satisfies condition (6), then the set $UCB(x, \gamma_0, P^*) = \{(i, j) : \varphi^e_{i,j}(x) = 0\}$ can be used as an upper confidence bound for $J_e(\theta, \gamma_0)$. In this case $UCB(x, \gamma_0, P^*)$ contains all pairs $(i, j)$, $i < j$, for which the hypotheses $h^e_{i,j}: \gamma_{i,j} > \gamma_0$ are accepted by a multiple hypotheses testing procedure with FWER control at level $\alpha = 1 - P^*$ in the strong sense.

In what follows we keep both notations $P^*$ and $\alpha$ (where $P^* = 1 - \alpha$ and $\alpha$ is the multiple significance level of the corresponding multiple hypotheses testing procedure) for convenience.


4 Lower confidence bounds 


A set $L(x) \subseteq J$ satisfying

$$P_\theta\big(L(x) \subseteq J_e(\theta, \gamma_0)\big) \ge P^*, \quad \forall \theta \in \Omega \tag{7}$$

will be called a lower confidence bound of level $P^*$ for $J_e(\theta, \gamma_0)$.

Now we show that a lower confidence bound can be constructed using multiple hypotheses testing procedures under an appropriate formulation of the individual hypotheses. Consider the set of individual hypotheses

$$h^n_{i,j}: \gamma_{i,j} \le \gamma_0 \quad \text{versus} \quad k^n_{i,j}: \gamma_{i,j} > \gamma_0, \quad i < j. \tag{8}$$

Standard tests for the individual hypotheses (8) have the form

$$\varphi^n_{i,j}(x) = \begin{cases} 1, & T_{i,j}(x) > c^n_{i,j}, \\ 0, & T_{i,j}(x) \le c^n_{i,j}, \end{cases} \tag{9}$$

where $c^n_{i,j}$ is defined from

$$P_{\gamma_0}\big(T_{i,j}(x) > c^n_{i,j}\big) = \alpha_{i,j}.$$

Lemma 4.1. If the set of tests $\varphi^n_{i,j}(x)$ is a multiple hypotheses testing procedure with FWER control at level $\alpha = 1 - P^*$ in the strong sense, then the set $LCB(x, \gamma_0, P^*) = \{(i, j) : \varphi^n_{i,j}(x) = 1,\ i < j\}$ can be used as a lower confidence bound of level $P^*$ for $J_e(\theta, \gamma_0)$.

The proof of Lemma 4.1 is given in the Appendix.

Therefore $LCB(x, \gamma_0, P^*)$ can be constructed by any multiple hypotheses testing procedure for hypotheses (8) with FWER control in the strong sense at level $\alpha = 1 - P^*$.

Corollary 4.1. $LCB(x, \gamma_0, P^*)$ contains at least one pair $(i, j)$ with $\gamma_{i,j} \le \gamma_0$ with probability at most $1 - P^*$.


5 Simultaneous upper and lower confidence bounds


Recall that sets $U(x)$ and $L(x)$ satisfying

$$P_\theta\big(L(x) \subseteq J_e(\theta, \gamma_0) \subseteq U(x)\big) \ge P^*, \quad \forall \theta \in \Omega \tag{10}$$

are called simultaneous upper and lower confidence bounds of level $P^*$ for $J_e(\theta, \gamma_0)$. To construct simultaneous upper and lower confidence bounds of level $P^*$ for $J_e(\theta, \gamma_0)$ one can use the following simple relations:

$$P_\theta\big(U(x) \supseteq J_e(\theta, \gamma_0) \supseteq L(x)\big) = 1 - P_\theta\big(\{U(x) \not\supseteq J_e(\theta, \gamma_0)\} \cup \{J_e(\theta, \gamma_0) \not\supseteq L(x)\}\big).$$

If $P_\theta(U(x) \supseteq J_e(\theta, \gamma_0)) \ge P_1^*$ and $P_\theta(J_e(\theta, \gamma_0) \supseteq L(x)) \ge P_2^*$, then

$$P_\theta\big(U(x) \supseteq J_e(\theta, \gamma_0) \supseteq L(x)\big) \ge P_1^* + P_2^* - 1.$$

Therefore the simplest way to derive simultaneous upper and lower confidence bounds of level $P^*$ for $J_e(\theta, \gamma_0)$ is to construct $UCB(x, \gamma_0, P_1^*)$ and $LCB(x, \gamma_0, P_2^*)$ separately under the condition $P^* = P_1^* + P_2^* - 1$. In the particular case of equal levels, $P_1^* = P_2^* = \frac{P^* + 1}{2} = 1 - \frac{\alpha}{2}$. To construct $UCB(x, \gamma_0, P_1^*)$ and $LCB(x, \gamma_0, P_2^*)$ separately one can use any multiple hypotheses testing procedures with FWER control in the strong sense at level $\frac{\alpha}{2}$. It is easy to see that $|UCB(x, \gamma_0, \frac{P^* + 1}{2})| \ge |UCB(x, \gamma_0, P^*)|$ and $|LCB(x, \gamma_0, \frac{P^* + 1}{2})| \le |LCB(x, \gamma_0, P^*)|$. This means that the gap between the upper and lower confidence bounds, which can be used to characterize uncertainty, increases.

Another way to construct simultaneous upper and lower confidence bounds is to apply stepdown procedures to the multiple testing of the hypotheses $h^e_{i,j}$ and $h^n_{i,j}$, $i, j = 1, \ldots, N$. In this case Holm's procedure is of particular interest, as it is the most rejective multiple hypotheses testing procedure with FWER control in the strong sense in the class of monotone stepdown procedures [12]. In this regard, attention may also be paid to the modified Holm procedure proposed in [3]. A good review of approaches to problems of this type is given in [23].

At the same time, neither of these approaches takes into account the specific structure of the problem of simultaneous testing of the hypotheses $h^e_{i,j}$ and $h^n_{i,j}$, $i, j = 1, \ldots, N$. The point is that among the $2M$ hypotheses $h^e_{i,j}$, $h^n_{i,j}$ the number of true hypotheses equals $M$. This allows one to use the Bonferroni procedure with individual levels $\alpha / M$ instead of $\alpha / 2M$, which rejects no fewer hypotheses than the Holm procedure and the modified Holm procedure. This surprising result was emphasized in [3].

This leads to the following Bonferroni type procedure.

Bonferroni type procedure: Test the hypotheses $h^e_{i,j}$ and $h^n_{i,j}$, $i, j = 1, \ldots, N$, at the same level $\frac{\alpha}{M}$. Pairs $(i, j)$ corresponding to accepted hypotheses $h^e_{i,j}$ are included in $UCB(x, \gamma_0, P^*)$, and pairs $(i, j)$ corresponding to rejected hypotheses $h^n_{i,j}$ are included in $LCB(x, \gamma_0, P^*)$.


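For the Bonferroni type procedure the critical values $c^e_{i,j}$, $c^n_{i,j}$ are defined from

$$P_{\gamma_0}\big(T_{i,j}(x) > c^n_{i,j}\big) = P_{\gamma_0}\big(T_{i,j}(x) < c^e_{i,j}\big) = \frac{\alpha}{M}. \tag{11}$$

If $\frac{\alpha}{M} < \frac{1}{2}$ (which is typical), then $c^e_{i,j} < c^n_{i,j}$ and by construction

$$LCB(x, \gamma_0, P^*) \subseteq UCB(x, \gamma_0, P^*).$$

Since exactly $M$ of the $2M$ tested hypotheses are true and each is tested at level $\frac{\alpha}{M}$, the Bonferroni inequality bounds the probability of at least one false rejection by $\alpha$. Hence the Bonferroni type procedure leads to sets $LCB(x, \gamma_0, P^*)$ and $UCB(x, \gamma_0, P^*)$ satisfying

$$P_\theta\big(LCB(x, \gamma_0, P^*) \subseteq J_e(\theta, \gamma_0) \subseteq UCB(x, \gamma_0, P^*)\big) \ge P^*, \quad \forall \theta \in \Omega.$$

Note that the proposed methodology does not depend on the particular multiple testing procedure.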
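The Bonferroni type procedure described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `bonferroni_bounds` and the toy statistics are hypothetical, and each statistic is assumed to be approximately standard normal when the similarity equals the threshold, so that the critical values are normal quantiles of levels $\alpha/M$ and $1 - \alpha/M$.

```python
from statistics import NormalDist

def bonferroni_bounds(T, alpha):
    """Return (LCB, UCB) from a dict of test statistics T[(i, j)], i < j.

    Assumption: T[(i, j)] is approximately N(0, 1) when gamma_ij = gamma_0.
    """
    m = len(T)                                   # M, the number of pairs
    c_e = NormalDist().inv_cdf(alpha / m)        # lower critical value
    c_n = NormalDist().inv_cdf(1 - alpha / m)    # upper critical value
    ucb = {p for p, t in T.items() if t >= c_e}  # hypothesis "edge" accepted
    lcb = {p for p, t in T.items() if t > c_n}   # hypothesis "no edge" rejected
    return lcb, ucb

# Hypothetical statistics for N = 3 nodes (M = 3 pairs).
T = {(0, 1): 4.1, (0, 2): 0.3, (1, 2): -3.7}
lcb, ucb = bonferroni_bounds(T, alpha=0.1)
gap = ucb - lcb
print(sorted(lcb), sorted(ucb), sorted(gap))
```

In this toy run the pair (0, 1) is a significant edge, the pair (1, 2) is a significant non-edge, and the pair (0, 2) stays between the two bounds.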


6 Gap between upper and lower confidence bounds 
for Bonferroni type procedure 


The construction of $UCB(x, \gamma_0, P^*)$ and $LCB(x, \gamma_0, P^*)$ partitions the set $J = \{(i, j) : i < j;\ i, j = 1, \ldots, N\}$ into three parts. The set $LCB(x, \gamma_0, P^*)$, containing all pairs $(i, j)$ for which the hypotheses $h^n_{i,j}: \gamma_{i,j} \le \gamma_0$ are rejected, is the set of all reliable (significant) edges. The set $J \setminus UCB(x, \gamma_0, P^*)$, containing all pairs $(i, j)$ for which the hypotheses $h^e_{i,j}: \gamma_{i,j} > \gamma_0$ are rejected, is the set of all reliable (significant) pairs of vertices without edges (significant non-edges). The set $UCB(x, \gamma_0, P^*) \setminus LCB(x, \gamma_0, P^*)$, containing all pairs $(i, j)$ for which both hypotheses $h^n_{i,j}: \gamma_{i,j} \le \gamma_0$ and $h^e_{i,j}: \gamma_{i,j} > \gamma_0$ are accepted simultaneously, is the set of pairs for which no significant conclusion on the true threshold similarity graph can be made at the given $P^*$ (the indetermined set, or zone of uncertainty). Note that in [7], [9] the terms 'significant set', 'indetermined set' and 'nonsignificant set' were used in a similar meaning. However, there the separation of the set of obtained p-values for testing hypotheses on edge inclusion was based on visual, subjective grounds.

Let us emphasize that the construction of $UCB(x, \gamma_0, P^*)$ and $LCB(x, \gamma_0, P^*)$ separates all conclusions on the true threshold similarity graph into two types: significant conclusions and nonsignificant conclusions. In this sense the uncertainty of the set of all obtained conclusions depends on the gap between the upper and lower bounds.

One can consider several concepts of the gap. In this article we consider the most obvious one, namely

$$UCB(x, \gamma_0, P^*) \setminus LCB(x, \gamma_0, P^*). \tag{12}$$

The size of (12) characterizes the size of the zone of uncertainty. We use the expected gap size to evaluate the uncertainty of our conclusions about the true threshold similarity graph; namely, the measure of uncertainty is

$$U_\theta = E_\theta\big(|UCB(x, \gamma_0, P^*) \setminus LCB(x, \gamma_0, P^*)|\big). \tag{13}$$

As follows from (11), for the Bonferroni type procedure in the case $\frac{\alpha}{M} < \frac{1}{2}$ the measure of uncertainty (13) equals

$$U_\theta = \sum_{(i,j) \in J} P_\theta\big(c^e_{i,j} \le T_{i,j}(x) \le c^n_{i,j}\big). \tag{14}$$

The gap (13) can be calculated explicitly for some special cases of the matrix (1). Let the matrix (1) be of intraclass type ($\gamma_{i,i} = 1$, $i = 1, \ldots, N$; $\gamma_{i,j} = \gamma_0$, $\forall i \neq j$; $i, j = 1, \ldots, N$). In this case for the threshold similarity graph with threshold $\gamma_0$ one has

$$P_{\gamma_0}\big(\varphi^e_{i,j}(x) = 0,\ \varphi^n_{i,j}(x) = 0\big) = P_{\gamma_0}\big(c^e_{i,j} \le T_{i,j}(x) \le c^n_{i,j}\big) = 1 - \frac{2\alpha}{M}.$$

Therefore

$$E_{\gamma_0}\big(|UCB(x, \gamma_0, P^*) \setminus LCB(x, \gamma_0, P^*)|\big) = M\left(1 - \frac{2(1 - P^*)}{M}\right) = M - 2(1 - P^*).$$

This case corresponds to maximal uncertainty.
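Let

$$K_0 = \{(i, j) : \gamma_{i,j} = \gamma_0\}, \quad K_1 = \{(i, j) : \gamma_{i,j} \neq \gamma_0\}.$$

Let

$$\beta_{i,j} = P_{\gamma_{i,j}}\big(\varphi^e_{i,j}(x) = 0,\ \varphi^n_{i,j}(x) = 0\big).$$

Then for the threshold similarity graph with threshold $\gamma_0$, for all $(i, j) \in K_0$ one has

$$\beta_{i,j} = 1 - \frac{2\alpha}{M}.$$

It follows that

$$E_\theta\big(|UCB(x, \gamma_0, P^*) \setminus LCB(x, \gamma_0, P^*)|\big) = |K_0|\left(1 - \frac{2(1 - P^*)}{M}\right) + \sum_{(i,j) \in K_1} \beta_{i,j}. \tag{15}$$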


From (15) one can obtain the following:

1. If the tests $\varphi^e_{i,j}(x)$, $\varphi^n_{i,j}(x)$ are consistent, then as $n \to \infty$ the uncertainty is determined by $K_0$, $M$ and $P^*$.

2. If $N \to \infty$, the tests $\varphi^e_{i,j}(x)$, $\varphi^n_{i,j}(x)$ are consistent and $n \to \infty$, then the uncertainty is determined by $K_0$ only.

3. As $n \to \infty$ the uncertainty approaches 0 only in the case $|K_0| = 0$. Note that in [24] the case $\gamma_{i,j} = \gamma_0$ was considered unrealistic.

4. If there exist uniformly most powerful tests $\varphi^e_{i,j}(x)$, $\varphi^n_{i,j}(x)$ of level $\frac{\alpha}{M}$ and the distribution of the statistics $T_{i,j}(x)$ of these tests is symmetric, then the uncertainty (15) of the corresponding Bonferroni type procedure is minimal, since $\sum_{(i,j) \in K_1} \beta_{i,j}$ is minimal in the class of tests of level $\frac{\alpha}{M}$. This result follows from [22]. Although such tests exist only in rare cases, this remark may be useful for getting an idea of the size of the minimal gap obtainable by the Bonferroni type procedure.


7 Three random variable networks in the class
of elliptically contoured distributions


To understand the limits of our approach, let us consider the wide class of elliptically contoured distributions, popular in economic applications [14], [18]. The random vector $X = (X_1, X_2, \ldots, X_N)$ has an elliptically contoured distribution $ECD_N(\mu, \Lambda, g)$ [1] if its density function has the form

$$f(x; \mu, \Lambda) = |\Lambda|^{-\frac{1}{2}}\, g\{(x - \mu)' \Lambda^{-1} (x - \mu)\}, \tag{16}$$

where $\Lambda$ is a positive definite matrix, $g(x) \ge 0$, and

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(y'y)\, dy_1 \ldots dy_N = 1.$$

This class includes, in particular, the multivariate normal distribution and the multivariate Student distribution. It is also known that $E(X_i) = \mu_i$, $i = 1, 2, \ldots, N$, if the expectation exists.

In order to obtain more detailed results on the stability of the confidence level $P^*$ and the value of the measure of uncertainty (14), we consider three random variable networks [18]:

• Pearson correlation network $(X, \gamma^P)$ with elliptically contoured distribution, where the vector $X = (X_1, X_2, \ldots, X_N)$ has an elliptically contoured distribution and the measure is

$$\gamma^P_{i,j} = \rho_{i,j} = \frac{E[(X_i - E X_i)(X_j - E X_j)]}{\sqrt{D(X_i) D(X_j)}},$$

where $D(X_i)$ is the variance of $X_i$;

• Sign similarity (or Fechner correlation) network $(X, \gamma^{Sg})$ with elliptically contoured distribution, where the vector $X = (X_1, X_2, \ldots, X_N)$ has an elliptically contoured distribution and the measure is $\gamma^{Sg}_{i,j} = p^{i,j} = P((X_i - \mu_i)(X_j - \mu_j) > 0)$ (for the Fechner correlation network $(X, \gamma^{Fh})$ one has $\gamma^{Fh}_{i,j} = 2p^{i,j} - 1$);

• Kendall correlation network $(X, \gamma^{Kd})$ with elliptically contoured distribution, where the vector $X = (X_1, X_2, \ldots, X_N)$ has an elliptically contoured distribution and the measure is $\gamma^{Kd}_{i,j} = \tau_{i,j} = 2P[(X_i(t) - X_i(s))(X_j(t) - X_j(s)) > 0] - 1$, where $(X_i(t), X_j(t))'$ and $(X_i(s), X_j(s))'$ are independent copies of the vector $(X_i, X_j)'$.


Sign similarity networks with elliptically contoured distributions were investigated in [17]. It was shown that

$$\gamma^{Sg}_{i,j} = p^{i,j} = P((X_i - \mu_i)(X_j - \mu_j) > 0) = \frac{1}{2} + \frac{1}{\pi} \arcsin \rho_{i,j}. \tag{17}$$

Therefore the true threshold similarity graph constructed in the sign similarity network with elliptically contoured distribution with threshold $p^0$ coincides with the true threshold similarity graph constructed in the Pearson correlation network with elliptically contoured distribution with threshold $\rho_0$ if $p^0 = \frac{1}{2} + \frac{1}{\pi} \arcsin \rho_0$.

In [10] it was proved that $\tau_{i,j} = 2p^{i,j} - 1$ in the class of elliptically contoured distributions $ECD_N(\mu, \Lambda, g)$. Therefore the true threshold similarity graph constructed in the Fechner correlation network $(X, \gamma^{Fh})$ with elliptically contoured distribution with threshold $\gamma_0$ coincides with the true threshold similarity graph constructed in the Kendall correlation network $(X, \gamma^{Kd})$ with elliptically contoured distribution with the same threshold.

It follows that the true threshold similarity graphs in these random variable networks are equivalent under the corresponding thresholds.
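The correspondence between thresholds can be made concrete with a small Python sketch based on relation (17) and on the identity $\tau_{i,j} = 2p^{i,j} - 1$. The function names are hypothetical and the threshold values are chosen only for illustration.

```python
import math

def pearson_to_sign_threshold(rho0):
    """Sign similarity threshold p0 = 1/2 + arcsin(rho0)/pi, from relation (17)."""
    return 0.5 + math.asin(rho0) / math.pi

def pearson_to_kendall_threshold(rho0):
    """Kendall (and Fechner) threshold tau0 = 2*p0 - 1 = (2/pi)*arcsin(rho0)."""
    return 2.0 * pearson_to_sign_threshold(rho0) - 1.0

# Illustrative Pearson thresholds and the equivalent thresholds in the other networks.
for rho0 in (0.0, 0.3, 0.5):
    p0 = pearson_to_sign_threshold(rho0)
    tau0 = pearson_to_kendall_threshold(rho0)
    print(rho0, round(p0, 4), round(tau0, 4))
```

For example, a Pearson threshold of 0.5 corresponds to a sign similarity threshold of 2/3 and a Kendall threshold of 1/3, because $\arcsin 0.5 = \pi/6$.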

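We now give the statistics of the standard tests for the individual hypotheses (4), (8).

In the Pearson correlation network the measure $\gamma$ is the Pearson correlation, $\gamma^P_{i,j} = \rho_{i,j}$. Standard level-$\alpha$ tests [1] for the individual hypotheses (4), (8) in the Pearson correlation network with normal distribution have the form (5), (9), where

$$T_{i,j} = T^P_{i,j} = \frac{\sqrt{n}}{2}\left(\ln\frac{1 + r_{i,j}}{1 - r_{i,j}} - \ln\frac{1 + \gamma_0}{1 - \gamma_0}\right), \tag{18}$$

$$r_{i,j} = \frac{\sum_{t=1}^{n} (x_i(t) - \bar{x}_i)(x_j(t) - \bar{x}_j)}{\sqrt{\sum_{t=1}^{n} (x_i(t) - \bar{x}_i)^2 \sum_{t=1}^{n} (x_j(t) - \bar{x}_j)^2}}.$$

For $n > N$, $n \to \infty$, the constant $c^e_{i,j}$ is the $\alpha$-quantile of $N(0, 1)$ and the constant $c^n_{i,j}$ is the $(1 - \alpha)$-quantile of $N(0, 1)$.

Under $\rho_{i,j} = \gamma_0 + \delta$ the statistic $T^P_{i,j}$ has the asymptotic normal distribution

$$N\left(\frac{\sqrt{n}}{2}\left(\ln\frac{1 + \gamma_0 + \delta}{1 - \gamma_0 - \delta} - \ln\frac{1 + \gamma_0}{1 - \gamma_0}\right),\ 1\right).$$

Hence the gap (15) can be calculated using tables of the normal distribution (see Example 11.1 in the Appendix).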
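The calculation of the expected gap under the normal approximation can be sketched as follows. This is a hedged illustration rather than the computation used for the Appendix example: the function name `expected_gap_pearson` is hypothetical, the statistic for each pair is assumed to be exactly normal with unit variance, and the prefactor $\sqrt{n}/2$ of the shift is an assumption of this sketch.

```python
import math
from statistics import NormalDist

def expected_gap_pearson(rho, gamma0, n, p_star):
    """Approximate the expected gap: sum over i < j of P(c_e <= T_ij <= c_n).

    Assumptions of this sketch:
      * T_ij ~ N(shift_ij, 1), with
        shift_ij = (sqrt(n)/2) * (ln((1+rho_ij)/(1-rho_ij)) - ln((1+gamma0)/(1-gamma0)));
      * critical values are the alpha/M and 1 - alpha/M quantiles of N(0, 1).
    """
    std = NormalDist()
    size = len(rho)
    m = size * (size - 1) // 2          # M = N(N-1)/2
    alpha = 1.0 - p_star
    c_e = std.inv_cdf(alpha / m)
    c_n = std.inv_cdf(1.0 - alpha / m)
    z0 = math.log((1 + gamma0) / (1 - gamma0))
    total = 0.0
    for i in range(size):
        for j in range(i + 1, size):
            z = math.log((1 + rho[i][j]) / (1 - rho[i][j]))
            shift = 0.5 * math.sqrt(n) * (z - z0)
            # Probability that pair (i, j) stays in the zone of uncertainty.
            total += std.cdf(c_n - shift) - std.cdf(c_e - shift)
    return total

# Intraclass matrix: every off-diagonal correlation equals the threshold.
N = 5
rho_intraclass = [[1.0 if i == j else 0.1 for j in range(N)] for i in range(N)]
gap_intraclass = expected_gap_pearson(rho_intraclass, gamma0=0.1, n=100, p_star=0.9)
print(gap_intraclass)
```

For the intraclass matrix all shifts are zero, so the sketch reproduces the maximal-uncertainty value $M - 2(1 - P^*)$, which is 9.8 for $N = 5$ and $P^* = 0.9$.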

Note that in the Pearson correlation network with normal distribution one can also apply, for the individual hypotheses (4), (8), tests of the form (5), (9) based on the statistics $r_{i,j}$. It is known that tests (5), (9) based on the statistics $r_{i,j}$ are uniformly most powerful in the class of tests invariant with respect to shift and scale transformations for the normal distribution [22]. However, these tests do not control the significance level under deviation from normality [19]. Since the results obtained with tests (5), (9) based on the statistics $r_{i,j}$ and on the statistics $T^P_{i,j}$ almost coincide [1] in the regime $n \to \infty$, $n > N$, in the simulation study of the stability of the Bonferroni type procedure we use tests (5), (9) based on the statistics $T^P_{i,j}$, which are more convenient for practical applications.

In the sign similarity network the measure $\gamma$ is defined by $\gamma^{Sg}_{i,j} = p^{i,j} = P((X_i - \mu_i)(X_j - \mu_j) > 0)$. Level-$\alpha$ tests for the individual hypotheses (4), (8) in the sign similarity network with elliptically contoured distribution have the form (5), (9), where

$$T_{i,j} = T^{Sg}_{i,j} = \sum_{t=1}^{n} I_{i,j}(t), \tag{19}$$

$$I_{i,j}(t) = \begin{cases} 1, & (x_i(t) - \mu_i)(x_j(t) - \mu_j) > 0, \\ 0, & \text{otherwise}, \end{cases}$$

$c^e_{i,j}$ is the $\alpha$-quantile of the binomial distribution $b(n, p^0)$, and $c^n_{i,j}$ is the $(1 - \alpha)$-quantile of the binomial distribution $b(n, p^0)$. In the case $p^{i,j} = p^0 + \delta$ the statistic $T^{Sg}_{i,j}$ has the binomial distribution $b(n, p^0 + \delta)$, and one can apply the normal approximation $N(n(p^0 + \delta),\ n(p^0 + \delta)(1 - p^0 - \delta))$. As follows from [17], the power functions $\beta_{i,j}$, $i \neq j$; $i, j = 1, \ldots, N$, do not depend on the function $g$ in the class $ECD_N(\mu, \Lambda, g)$.
Level-$\alpha$ tests for the individual hypotheses (4), (8) in the Kendall correlation network have the form (5), (9), where

$$T_{i,j} = T^{Kd}_{i,j} = \frac{\sqrt{n}\,(\hat{\gamma}^{Kd}_{i,j} - \gamma_0)}{4\sqrt{\hat{P}_{cc} - \hat{P}_c^2}},$$

$$\hat{\gamma}^{Kd}_{i,j} = \frac{1}{n(n - 1)} \sum_{t \neq s} I^{Kd}_{i,j}(t, s),$$

$$I^{Kd}_{i,j}(t, s) = \begin{cases} 1, & (x_i(t) - x_i(s))(x_j(t) - x_j(s)) > 0, \\ -1, & (x_i(t) - x_i(s))(x_j(t) - x_j(s)) < 0. \end{cases}$$

Here $\hat{P}_{cc}$ is an estimate of the probability that, among three pairs of observations, the second and the third are both concordant with the first (concordance with respect to order), and $\hat{P}_c = (\hat{\gamma}^{Kd}_{i,j} + 1)/2$. As $n \to \infty$ the statistic $T^{Kd}_{i,j}$ has the standard normal distribution $N(0, 1)$ [20]. Note that in the general case (when the hypothesis of independence does not hold) the stability properties of tests (5), (9) based on the statistic $T^{Kd}_{i,j}$ are not clear.
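The building blocks of the three statistics can be sketched in plain Python for a single pair of variables. The function names and the toy data are hypothetical; the prefactor $\sqrt{n}/2$ in the Pearson statistic and the values $\pm 1$ of the Kendall indicator are assumptions of this sketch, and the estimate $\hat{P}_{cc}$ needed for the full Kendall statistic is not computed here.

```python
import math

def pearson_statistic(xi, xj, gamma0):
    """Fisher-transform statistic for one pair (prefactor sqrt(n)/2 is assumed)."""
    n = len(xi)
    mi, mj = sum(xi) / n, sum(xj) / n
    sxy = sum((a - mi) * (b - mj) for a, b in zip(xi, xj))
    sxx = sum((a - mi) ** 2 for a in xi)
    syy = sum((b - mj) ** 2 for b in xj)
    r = sxy / math.sqrt(sxx * syy)               # sample Pearson correlation
    z = math.log((1 + r) / (1 - r))
    z0 = math.log((1 + gamma0) / (1 - gamma0))
    return 0.5 * math.sqrt(n) * (z - z0)

def sign_statistic(xi, xj, mu_i, mu_j):
    """Number of observations with deviations from the means of the same sign."""
    return sum(1 for a, b in zip(xi, xj) if (a - mu_i) * (b - mu_j) > 0)

def kendall_estimate(xi, xj):
    """Sample Kendall correlation: mean of +1/-1 over ordered pairs t != s."""
    n = len(xi)
    total = 0
    for t in range(n):
        for s in range(n):
            if t != s:
                prod = (xi[t] - xi[s]) * (xj[t] - xj[s])
                total += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    return total / (n * (n - 1))

# Toy data, for illustration only.
xi = [1.0, 2.0, 3.0, 4.0, 5.0]
xj = [2.0, 1.0, 4.0, 3.0, 5.0]
print(pearson_statistic(xi, xj, gamma0=0.3))
print(sign_statistic(xi, xj, mu_i=3.0, mu_j=3.0))
print(kendall_estimate(xi, xj))
```

The sign statistic uses known means, as in (19); in practice the means would have to be known or estimated, which is outside the scope of this sketch.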


8 Simulation results 


In this section we study by numerical simulation the stability of the confidence probability $P^*$ and the size of the gap between the upper and lower confidence bounds constructed by the Bonferroni type procedure in the three random variable networks.

The investigation is based on simulated observations from a mixture distribution of the form

$$f_{mix}(x) = (1 - \epsilon) f_{Gauss}(x; \mu, \Lambda) + \epsilon f_{St,k}(x; \mu, \Lambda), \tag{20}$$

where $f_{Gauss}(x)$ is the density of the $N$-dimensional normal distribution and $f_{St,k}(x)$ is the density of the $N$-dimensional Student distribution with $k = 3$ degrees of freedom. This distribution belongs to the class of elliptically contoured distributions.
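Sampling from the mixture (20) can be sketched with the Python standard library only. The function names are hypothetical, and the sketch assumes the usual representation of a multivariate Student vector as a normal vector divided by $\sqrt{\chi^2_k / k}$; each observation is drawn from the Student component with probability $\epsilon$ and from the normal component otherwise.

```python
import math
import random

def cholesky(a):
    """Lower-triangular factor L with L L' = a (a symmetric positive definite)."""
    size = len(a)
    low = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1):
            s = sum(low[i][q] * low[j][q] for q in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def sample_mixture(n, mu, lam, eps, k=3, rng=random):
    """Draw n observations from (1 - eps) * Gauss + eps * Student_k (a sketch)."""
    low = cholesky(lam)
    size = len(mu)
    sample = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(size)]
        y = [sum(low[i][q] * z[q] for q in range(i + 1)) for i in range(size)]
        if rng.random() < eps:
            # Student component: divide by sqrt(chi2_k / k); chi2_k = Gamma(k/2, scale 2).
            chi2 = rng.gammavariate(k / 2.0, 2.0)
            scale = math.sqrt(chi2 / k)
            y = [v / scale for v in y]
        sample.append([m + v for m, v in zip(mu, y)])
    return sample

# Illustrative 2-dimensional example with a fixed seed.
rng = random.Random(1)
obs = sample_mixture(50, mu=[0.0, 0.0], lam=[[1.0, 0.5], [0.5, 1.0]], eps=0.3, rng=rng)
print(len(obs), len(obs[0]))
```

Passing a seeded `random.Random` instance keeps a simulation run reproducible.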

As the true network model we consider a network model with $N = 30$ nodes corresponding to the stocks of the Dow Jones index. As the true matrix $\Lambda$ we take a correlation matrix. The matrix $\Lambda$ and the vector $\mu$ are calculated from observations of stock returns for 2021. The histogram of Pearson correlations is presented in Figure 1. The histogram reflects the distribution of edge weights in the true Pearson network model. The distributions of edge weights in the true Kendall and sign network models can be calculated using (17). In our experiments we apply transformation (17) to calculate the thresholds in the different networks. Therefore the true threshold similarity graphs coincide.

Figure 1: Histogram of Pearson correlations of Dow-Jones stocks for 2021.


The algorithm for the numerical investigation is the following:

1. Choose the value of $\epsilon$ ($\epsilon = 0, 0.1, 0.2, \ldots, 1$).

2. Generate $n$ ($n > N$) $N$-dimensional ($N = 30$) random vectors with the mixture distribution (20).

3. Construct simultaneous upper and lower bounds for edges using the Bonferroni type procedure with the constants $c^e_{i,j}$, $c^n_{i,j}$ defined from equation (11). For the Pearson and Kendall networks $c^e_{i,j}$ is the $\frac{\alpha}{M}$-quantile of $N(0; 1)$ and $c^n_{i,j}$ is the $(1 - \frac{\alpha}{M})$-quantile of $N(0; 1)$. For the sign network $c^e_{i,j}$ is the $\frac{\alpha}{M}$-quantile of the binomial distribution $b(n; p^0)$ and $c^n_{i,j}$ is the $(1 - \frac{\alpha}{M})$-quantile of the binomial distribution $b(n; p^0)$.

4. In order to approximate the empirical coverage probability (under the nominal value $P^*$) and to estimate the mean size of the upper confidence bound, the mean size of the lower confidence bound and the mean size of the gap, the experiment is repeated 200 times for each $\epsilon$.

The algorithm was used for different numbers of observations $n$.
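The summary computed in the last step of the algorithm can be sketched as a small helper. The function name `coverage_summary` and the toy replications are hypothetical; the helper only aggregates bounds that have already been constructed, with each replication given as a pair of Python sets (lower bound, upper bound).

```python
def coverage_summary(true_edges, bounds):
    """Summarise replications; bounds is a list of (LCB, UCB) pairs of sets."""
    reps = len(bounds)
    # A replication "covers" the true graph when LCB is a subset of the true
    # edge set and the true edge set is a subset of UCB.
    covered = sum(1 for lcb, ucb in bounds if lcb <= true_edges <= ucb)
    return {
        "coverage": covered / reps,
        "mean_lcb": sum(len(lcb) for lcb, _ in bounds) / reps,
        "mean_ucb": sum(len(ucb) for _, ucb in bounds) / reps,
        "mean_gap": sum(len(ucb - lcb) for lcb, ucb in bounds) / reps,
    }

# Three hypothetical replications for a graph whose only true edge is (0, 1).
true_edges = {(0, 1)}
bounds = [
    ({(0, 1)}, {(0, 1), (0, 2)}),   # covers the true graph
    (set(), {(0, 1)}),              # covers the true graph
    ({(0, 2)}, {(0, 1), (0, 2)}),   # false edge in the lower bound: no coverage
]
summary = coverage_summary(true_edges, bounds)
print(summary)
```

In a full experiment the list `bounds` would hold one pair of sets per replication (200 in the study above), and the empirical coverage would be compared with the nominal value $P^*$.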


8.1 Stability of confidence probability 


In order to investigate the stability of the confidence probability we select a nominal value $P^*$. In our experiments $P^* = 0.9$, which is a typical choice. We then simulate observations from the mixture distribution (20), estimate the empirical coverage probability and compare it with the nominal value. The results are analyzed using the figures below.

Figure 2 (left) shows the dependence of the empirical coverage probability on $\epsilon$ for $\gamma_0 = 0.2$, $n = 250$ for the considered measures of similarity. Figure 2 (right) shows the same dependence for $\gamma_0 = 0.2$, $n = 1000$. Figure 3 (left) shows the dependence of the empirical coverage probability on $\epsilon$ for $\gamma_0 = 0.4$, $n = 250$, and Figure 3 (right) shows it for $\gamma_0 = 0.4$, $n = 1000$.

Figure 2: Dependence of empirical coverage probability on $\epsilon$ (mixture = 100$\epsilon$). $\gamma_0 = 0.2$. Left: $n = 250$, right: $n = 1000$.

The results of the experiments presented in Figures 2, 3 show that the nominal level $P^*$ is controlled by the Bonferroni type procedure based on the Kendall and sign similarity measures for all chosen values of the mixture parameter $\epsilon$ and numbers of observations $n$. At the same time, the empirical confidence probability is very close to 1. This can be explained by the conservativeness of the Bonferroni inequality. An important fact obtained by simulation is the essential dependence of the empirical confidence probability on $\epsilon$ for the Bonferroni type procedure based on the Pearson measure: this procedure loses control of the confidence probability at the nominal level $P^*$ as $\epsilon$ increases.

Figure 3: Dependence of empirical coverage probability on $\epsilon$ (mixture = 100$\epsilon$). $\gamma_0 = 0.4$. Left: $n = 250$, right: $n = 1000$.

Figure 4 shows the dependence of the empirical coverage probability on $\epsilon$ for different values of the threshold $\gamma_0 = 0.3, 0.5, 0.7, 0.9$ for the Pearson (left) and Kendall (right) measures of similarity. These results show how the stability of the empirical confidence probability depends on the true number of edges. Namely, for the Pearson correlation the empirical coverage probability is controlled for all $\epsilon$ only in the case $\gamma_0 = 0.9$, which is the case when there are no edges at all. In contrast, for the Kendall correlation the empirical coverage probability is controlled for all $\gamma_0 = 0.3, 0.5, 0.7, 0.9$ and all $\epsilon$, and does not depend on the true number of edges. The same holds for the sign (Fechner) network.

Figure 4: Dependence of empirical coverage probability on $\gamma_0$. Left: Pearson correlation, right: Kendall correlation. $\gamma_0 = 0.3, 0.5, 0.7, 0.9$, $n = 1000$.


8.2 Size of gap between upper and lower confidence bounds 
constructed by Bonferroni type procedure 


In order to investigate the size of the gap between the upper and lower confidence bounds constructed by the Bonferroni type procedure, we generate $n$ ($n > N$) $N$-dimensional ($N = 30$) random vectors with the mixture distribution (20) and construct simultaneous upper and lower bounds for edges using the Bonferroni type procedure. The results are presented in Figures 5-7.

Figure 5 shows the dependence of the sizes of the simultaneous upper ($|UCB(x, \gamma_0, P^*)|$) and lower ($|LCB(x, \gamma_0, P^*)|$) bounds on $n$ for $\gamma_0 = 0.3$ (left) and $\gamma_0 = 0.4$ (right), $P^* = 0.9$, in the different networks for the case of normal distribution. One can see that for the Pearson and Kendall correlations the upper and lower bounds are closer to the true number of edges than for the sign correlation. It is important to emphasize that the difference between the sizes of the simultaneous upper bounds in the Pearson and Kendall correlation networks is small. The same holds for the lower confidence bounds.

Figure 5: Dependence of the sizes of simultaneous upper and lower bounds on $n$. $\gamma_0 = 0.3$ (left) and $\gamma_0 = 0.4$ (right), $P^* = 0.9$. Normal distribution.

Note that in Figures 2, 3 the Bonferroni type procedures based on the sign and Kendall correlation measures show similar behavior. This is expected, since the sign and Kendall tests are robust. This is not the case in Figure 5, where the Bonferroni type procedures based on the sign and Kendall correlation measures show different behavior. This can be explained by the fact that tests based on the Kendall correlation are more powerful than tests based on the sign correlation.

Figure 6 shows the dependence of the gap size (14) on $n$ for $\gamma_0 = 0.4, 0.5, 0.6$, $P^* = 0.9$ and the considered measures of similarity for the case of normal distribution. One can see that for the Pearson and Kendall correlations the gap is smaller than for the sign correlation. Moreover, the difference between the gap sizes in the Pearson and Kendall correlation networks is small, despite the fact that for the normal distribution the Bonferroni type procedure with Pearson correlation tests has some optimality properties (see Section 6).

Let us investigate the stability of the gap size and of the empirical confidence probability under deviation from the normal distribution. Figure 7 (left) shows the dependence of the gap size (14) on $\epsilon$ for $n = 250; 2000$, $\gamma_0 = 0.3$, $P^* = 0.9$ and the Pearson measure of similarity. Figure 7 (right) shows the dependence of the empirical coverage probability on $\epsilon$ for the same $n$, $\gamma_0$ and $P^*$. One can see that the gap size for the Student distribution is smaller than for the normal distribution. This result can be misleading, however, because the empirical coverage probability is close to 0 for the Student distribution, whereas for the normal distribution it is close to 1.

Figure 6: Dependence of gap size (14) on $n$ for $\gamma_0 = 0.4, 0.5, 0.6$, $P^* = 0.9$.

Figure 7: Dependence of gap size (14) (left) and empirical coverage probability (right) on $\epsilon$ for $n = 250; 2000$, $\gamma_0 = 0.3$, $P^* = 0.9$.


9 Conclusions


A method for constructing upper and lower confidence bounds for threshold similarity graph identification is proposed. The method is based on multiple hypotheses testing procedures with FWER control in the strong sense. The resulting confidence bounds allow one to derive statistically significant conclusions on the edges of the threshold similarity graph. In particular, it is possible to identify the statistically significant set of edges included in the threshold similarity graph and, simultaneously, the statistically significant set of edges not included in it. Note that interest in methods of statistically significant network analysis has increased in recent decades [21], [6], [25].

To investigate the stability of the confidence probability and the size of the gap between the upper and lower confidence bounds with respect to the distribution, three random variable networks are considered in the wide class of elliptical distributions. The results show that for a normally distributed vector $X$ the best (in some sense) procedure for constructing upper and lower confidence bounds is the Bonferroni type procedure based on Pearson correlation tests. On the other hand, this is not true under deviation from the normal distribution. Namely, it is shown that the Bonferroni type procedure based on Pearson correlation tests does not control the confidence probability for elliptical distributions with heavy tails.

Interesting results are obtained for the Kendall correlation network. Namely, despite the weak instability of the Kendall correlation test in the class of elliptically contoured distributions [19], the Bonferroni type procedure for constructing upper and lower confidence bounds in the Kendall correlation network is stable. Moreover, its uncertainty in the case of normal distribution is very close to that of the Bonferroni type procedure based on Pearson correlation tests. Furthermore, the uncertainty of the Bonferroni type procedure based on Kendall correlation tests is smaller than that of the Bonferroni type procedure based on sign similarity tests. These experimental results allow us to recommend the Bonferroni type procedure based on Kendall correlation tests for practical applications.


11 Appendix


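Proof of Lemma 3.1.
Let $U(x) = \{(i, j) : \varphi^e_{i,j}(x) = 0,\ i < j\}$ be the set of index pairs for which the hypotheses (4) are accepted. It is sufficient to prove the equivalence of the events

$$U(x) \supseteq J_e(\theta, \gamma_0) \quad \text{and} \quad \prod_{(i,j) \in J_e(\theta, \gamma_0)} \big(1 - \varphi^e_{i,j}(x)\big) = 1.$$

If $U(x) \supseteq J_e(\theta, \gamma_0)$, then for all $(i, j) \in J_e(\theta, \gamma_0)$ one has $\varphi^e_{i,j}(x) = 0$ and therefore

$$\prod_{(i,j) \in J_e(\theta, \gamma_0)} \big(1 - \varphi^e_{i,j}(x)\big) = 1.$$

If

$$\prod_{(i,j) \in J_e(\theta, \gamma_0)} \big(1 - \varphi^e_{i,j}(x)\big) = 1,$$

then for all $(i, j) \in J_e(\theta, \gamma_0)$ one has $\varphi^e_{i,j}(x) = 0$ and therefore $(i, j) \in U(x)$ by definition of the set $U(x)$.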
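Proof of Lemma 4.1.
Let $U_n(x) = \{(i, j) : \varphi^n_{i,j}(x) = 0,\ i < j\}$ be the set of index pairs for which the hypotheses (8) are accepted. Analogously to Lemma 3.1 one can prove

$$P_\theta\big(U_n(x) \supseteq J_n(\theta, \gamma_0)\big) \ge P^*, \quad \forall \theta \in \Omega.$$

Since $L(x) = J \setminus U_n(x)$,

$$P_\theta\big(J \setminus U_n(x) \subseteq J \setminus J_n(\theta, \gamma_0)\big) \ge P^*, \quad \forall \theta \in \Omega,$$

or

$$P_\theta\big(L(x) \subseteq J_e(\theta, \gamma_0)\big) \ge P^*, \quad \forall \theta \in \Omega.$$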

Example 11.1. Let the vector $X = (X_1, \ldots, X_N)$ have the normal distribution $N(\mu, \Lambda)$, $P^* = 0.9$, $N = 5$, $\gamma_0 = 0.1$, $n = 102$.

• If


then $U_\theta = 6.2156713$.

• If


then $U_\theta = 5.347538$.

• If


then $U_\theta = 4.0667635$.

• If

$$\Lambda = \begin{pmatrix} 1 & 0.4 & 0.4 & 0.3 & 0.3 \\ 0.4 & 1 & 0.4 & 0.4 & 0.3 \\ 0.4 & 0.4 & 1 & 0.4 & 0.4 \\ 0.3 & 0.4 & 0.4 & 1 & 0.4 \\ 0.3 & 0.3 & 0.4 & 0.4 & 1 \end{pmatrix}$$

then $U_\theta = 3.1762985$.

The values $U_\theta$ are calculated using tests based on the statistics $T^P_{i,j}$ (18). The powers of these tests are close to the powers of the uniformly most powerful invariant tests. Hence the values $U_\theta$ indicate the potential minimum zone of uncertainty as a function of the chosen matrix $\Lambda$, the threshold $\gamma_0$ and the number of observations $n$.